Abstract

We lift important results of the theory of samples of discrete ergodic
information sources to the multidimensional setting. We use the technique
of packings and coverings with multidimensional windows in entropy
estimation and universal lossless compression. In particular, we construct
sequences of multidimensional array sets which, in the limit, contain the
generated samples of any ergodic source of entropy rate below a given $h_0$
with probability 1, and whose cardinality grows at most at exponential rate
$h_0$. We thereby extend the mathematical framework relevant for universal
source coding of multidimensionally correlated data.

Keywords: Universal codes, ergodic theory, typical sets, discrete samplings. 



1 Introduction 

The purpose of this paper is to lift results about universally typical sets, typically 
sampled sets and empirical entropy estimation from the usual 1-dimensional 
(discrete time) setting to a multidimensional setting. We start with a short 
description of these concepts and a very brief review of related literature. 



An entropy-typical set is defined as a set of nearly full measure consisting 
of output sequences the negative log-probability of which is close to the en- 
tropy of the source distribution. The scope of this definition is revealed by the 
asymptotic equipartition property (AEP), which is present for a large class of 
processes [9, 3, 1, 10, 2]. The AEP was introduced by McMillan [9] as the
convergence in probability of the sequence $-\frac{1}{n}\log\mu(x_1^n)$ to a
constant $h$, namely the entropy rate of the process as introduced by Shannon
[12]. Roughly speaking, it implies that the output sequences of a random
process are typically confined to a 'small' set $T_n$ of events which all have
approximately the same probability of being realized, in contrast to the much
larger set of all possible output sequences. This means that individual
outcomes with much higher or smaller probability than $e^{-nh}$ will rarely
be observed. In particular, for stationary
ergodic processes the AEP is guaranteed by the Shannon-McMillan-Breiman 
(SMB) theorem [9, 3]. By the AEP, the entropy-typical sets have total proba- 
bility close to one, and their cardinality is fairly minimal among all sets with 
the latter property. This way, entropy-typical sets provide an important theo- 
retical framework for communication theory. Lossless source coding is a type of 
algorithm which performs data compression while ensuring that the exact re- 
construction of the original data is possible from the compressed data. Lossless 
data compression can be achieved by encoding the typical set of a stochastic 
source with fixed length block codes of length nh. By the AEP, this length nh 
is also the average length needed, cf. [13]. Hence compression at an asymptotic 
rate given by the entropy rate is possible. This is optimal in view of Shannon's 
source coding theorem [12, 4]. 
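As a small numerical illustration of the AEP-based block code described above, the following Python sketch counts the entropy-typical sequences of an i.i.d. binary source and the number of bits a fixed-length index into that set needs. The Bernoulli source, the parameter values and all function names are assumptions chosen for illustration; the paper itself treats general ergodic sources. Logarithms are taken to base 2, so lengths are in bits.

```python
from math import ceil, comb, log2


def binary_entropy(q):
    """Entropy in bits of a Bernoulli(q) source, 0 < q < 1."""
    return -q * log2(q) - (1 - q) * log2(1 - q)


def typical_set_stats(n, q, eps):
    """Count length-n binary sequences x with |-(1/n) log2 P(x) - h| <= eps.

    Returns (number of typical sequences, their total probability).
    """
    h = binary_entropy(q)
    count, prob = 0, 0.0
    for ones in range(n + 1):
        # All sequences with the same number of ones are equiprobable.
        log_p = ones * log2(q) + (n - ones) * log2(1 - q)
        if abs(-log_p / n - h) <= eps:
            count += comb(n, ones)
            prob += comb(n, ones) * 2.0 ** log_p
    return count, prob


n, q, eps = 200, 0.1, 0.1  # illustrative values only
h = binary_entropy(q)
count, prob = typical_set_stats(n, q, eps)
index_bits = ceil(log2(count))  # fixed-length code: index into the typical set
print(f"h = {h:.3f} bits, typical-set probability = {prob:.3f}")
print(f"index length = {index_bits} bits versus {n} raw bits")
```

The index length stays below roughly $n(h+\varepsilon)$ bits because every typical sequence has probability at least $2^{-n(h+\varepsilon)}$.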

The extension of the SMB theorem (and the AEP) from discrete time processes
$\mathbb{Z}$ to amenable groups, including the multidimensional setting
$\mathbb{Z}^d$, by Ornstein and Weiss [10] represented important progress. It
underlies the theory of encoding multidimensional sources. The relation
between universal compression and universally typical sets is rather direct:
any (asymptotically) optimal universal compression scheme defines sequences
of universally typical sets. For given $\varepsilon$, the set of all
$n^d$-blocks whose compressed representation needs at most
$(h + \varepsilon)n^d$ bits is universally typical for all sources with
entropy rate $h$ or less. Conversely, any constructive solution to the
problem of finding universally typical sets yields a universal compression
scheme, since the index in the universally typical set is an optimal code for
the block. As will turn out, our approach is constructive. One has to admit,
however, that such an ad hoc algorithm is, generally speaking, not very
useful in practice, because determining the index is likely to be very time
consuming.

In universal source coding, the aim is to find codes which efficiently com- 
press down to the theoretical limit, i.e. the entropy rate, for any ergodic source 
without a need to be adapted to the specific source. We emphasize here that 
codes of that type are optimal data compressors for any stationary source, since 
by the ergodic decomposition theorem (see e.g. [14]) any stationary source is 
a convex mixture of ergodic sources. Many prominent examples of formats for 
lossless data compression, like ZIP, are based on the implementation of the al- 
gorithms proposed by Lempel and Ziv (LZ) LZ77 [5] and LZ78 [6], or variants 
of them, like the Welch modification [15]. The LZ algorithms constitute a uni- 
versal means of constructing libraries. Yet, the LZ algorithms are designed as 
text compression schemes, i.e. for 1-dimensional data sources. 

For multidimensional data, Lempel and Ziv showed [7] that universal coding
of images is possible by first transforming the image to a 1-dimensional
stream (scanning the image with a Peano-Hilbert curve, a special type of
Hamilton path), and then applying the 1-dimensional algorithm LZ78. The idea
behind that approach is that the Peano-Hilbert curve scans hierarchically
complete blocks before leaving them, maintaining most local correlations that
way. In contrast, a simple row-by-row scan only preserves horizontal
correlations. With the Peano curve approach, however, while local
correlations in non-horizontal directions are preserved too, they are heavily
obscured by the inevitably fractal nature of that space-filling curve.
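To make the scanning idea concrete, here is a minimal Python sketch of a Hilbert-curve scan of a square image, using the widely known index-to-coordinate conversion for Hilbert curves. It is an illustration only: the function name `hilbert_d2xy` is an assumption, and the sketch is not claimed to reproduce the exact scan analysed in [7].

```python
def hilbert_d2xy(side, t):
    """Map index t in [0, side*side) to (x, y) on a Hilbert curve.

    `side` must be a power of two.
    """
    x = y = 0
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/reflect the quadrant so sub-curves connect end to end.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


side = 8
scan = [hilbert_d2xy(side, t) for t in range(side * side)]

# Consecutive sites of the scan are always lattice neighbours ...
steps = [abs(x1 - x0) + abs(y1 - y0)
         for (x0, y0), (x1, y1) in zip(scan, scan[1:])]
print(max(steps))
# ... but lattice neighbours can be far apart in the 1-dimensional stream.
position = {site: t for t, site in enumerate(scan)}
print(abs(position[(3, 0)] - position[(4, 0)]))
```

The second printed number shows the effect mentioned above: two horizontally adjacent sites on either side of a quadrant boundary end up far apart in the scanned stream.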

We take the point of view that the techniques of packing and counting can
be better exploited in data compression with a priori unknown distributions
if, instead of transforming the 'image' into a 1-dimensional stream by
scanning it with a curve, the multidimensional block structure is left
untouched. This will allow one to take more advantage of multidimensional
correlations between neighboring parts of the data, speed up the convergence
of the counting statistics, and in turn accelerate estimation and data
compression tasks. This approach will be carried out in a forthcoming paper.
The idea of the present paper is to extend relevant theoretical results about
typical sets and universally typical sets to a truly multidimensional
sampling window setting. The proofs of these extensions are guided by the
discussion of the 1-dimensional situation in Shields' monograph [13].

2 Settings 

We consider the $d$-dimensional lattice $\mathbb{Z}^d$ and the quadrant
$\mathbb{Z}^d_+$. Consider a finite alphabet $A$, $|A| < \infty$, and the set
of arrays with that alphabet: $\Sigma = A^{\mathbb{Z}^d}$,
$\Sigma_+ = A^{\mathbb{Z}^d_+}$. We define the set of $n$-words as the set of
$n \times \cdots \times n$ arrays $\Sigma^n := A^{\Lambda_n}$ for the $n$-box
$\Lambda_n := \{(i_1, \dots, i_d) \in \mathbb{Z}^d_+ : 0 \le i_j \le n-1,\ j \in \{1, \dots, d\}\}$.
An element $x^n \in \Sigma^n$ has elements $x^n(\mathbf{i}) \in A$ for
$\mathbf{i} \in \Lambda_n$.

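Let $\mathfrak{A}^{\mathbb{Z}^d}$ denote the $\sigma$-algebra of subsets of
$\Sigma$ generated by cylinder sets, i.e. sets of the following kind:

$$[y] := \{x \in \Sigma : x(\mathbf{i}) = y(\mathbf{i}),\ \mathbf{i} \in \Lambda\}, \quad y \in A^{\Lambda}, \quad \Lambda \subset \mathbb{Z}^d \text{ finite}.$$

If $C$ is a subset of $A^{\Lambda}$, we will use the notation $[C]$ for
$\bigcup_{y \in C} [y]$.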

We denote by $\sigma_r$ the natural lattice translation by the vector
$r \in \mathbb{Z}^d$ acting on $\Sigma$ by
$\sigma_r x(\mathbf{i}) := x(\mathbf{i} + r)$. We use the same notation
$\sigma_r$ to denote the induced action on the set $\mathcal{P}$ of
probability measures $\nu$ over $(\Sigma, \mathfrak{A}^{\mathbb{Z}^d})$:
$\sigma_r \nu(E) := \nu(\sigma_r^{-1} E)$. The set of all stationary
(translation-invariant) elements of $\mathcal{P}$ is denoted by
$\mathcal{P}_{stat}$, i.e. $\nu \in \mathcal{P}_{stat}$ if
$\sigma_r \nu = \nu$ for each $r \in \mathbb{Z}^d$. Those
$\nu \in \mathcal{P}_{stat}$ which cannot be decomposed as a proper convex
combination $\nu = \lambda_1 \nu_1 + \lambda_2 \nu_2$, with
$\nu_1 \ne \nu \ne \nu_2$ and $\nu_1, \nu_2 \in \mathcal{P}_{stat}$, are
called ergodic. The corresponding subset of $\mathcal{P}_{stat}$ is denoted by
$\mathcal{P}_{erg}$. Throughout this paper $\mu$ will denote an ergodic
$A$-process on $\Sigma$. By $\nu^n$ we denote the restriction of the measure
$\nu$ to the block $\Lambda_n$, obtained by the projection
$\Pi_n : x \in \Sigma \to x^n \in \Sigma^n$ with
$x^n(\mathbf{i}) = x(\mathbf{i})$, $\mathbf{i} \in \Lambda_n$. We use the same
notation $\Pi_k$ to denote the projections from $\Sigma^n$ to $\Sigma^k$,
$n \ge k$, defined in the same obvious way. The measurable map $\Pi_n$
transforms the given probability measure $\nu$ to the probability measure
denoted by $\nu^n$.

The entropy rate of a stationary probability measure $\nu$ is defined as the
limit of the scaled $n$-word entropies:

$$H(\nu^n) := -\sum_{x \in \Sigma^n} \nu^n(\{x\}) \log \nu^n(\{x\}),$$

$$h(\nu) := \lim_{n \to \infty} \frac{1}{n^d} H(\nu^n).$$

Here and in the following we write $\log$ for the dyadic logarithm $\log_2$.

For a shift $p \in \Lambda_k$ we consider the following partition of
$\mathbb{Z}^d$ into $k$-blocks:

$$\mathbb{Z}^d = \bigcup_{r \in k \cdot \mathbb{Z}^d} (\Lambda_k + r + p),$$

and in general we use the following notation:

The regular $k$-block partitions of a subset $M \subset \mathbb{Z}^d$ are the
families of sets defined by

$$\mathcal{R}_{M,k} := \{R_{M,k}(p) : p \in \Lambda_k\}, \qquad R_{M,k}(p) := \{(\Lambda_k + p + r) \cap M\}_{r \in k \cdot \mathbb{Z}^d}.$$

Clearly, for any $p$ the elements of $R_{M,k}(p)$ are disjoint and their
union gives $M$.
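The regular partition can be made concrete with a short Python sketch for the case where the subset is an $n$-box. The function names and the representation of a partition element as a tuple of half-open index intervals are assumptions made for this illustration only.

```python
from itertools import product


def axis_intervals(n, k, p):
    """Non-empty pieces of ({0..k-1} + p + r) ∩ {0..n-1} for r in k*Z, as [lo, hi)."""
    intervals = []
    start = p - k if p > 0 else 0  # r = -k contributes the piece [0, p) when p > 0
    while start < n:
        lo, hi = max(start, 0), min(start + k, n)
        if lo < hi:
            intervals.append((lo, hi))
        start += k
    return intervals


def regular_partition(n, k, p):
    """p-shifted regular k-block partition of the n-box; p is a tuple of length d."""
    return list(product(*(axis_intervals(n, k, p_i) for p_i in p)))


def box_sites(box):
    """All lattice sites contained in one partition element."""
    return list(product(*(range(lo, hi) for lo, hi in box)))


n, k, p = 7, 3, (1, 2)
boxes = regular_partition(n, k, p)
cubic = [b for b in boxes if all(hi - lo == k for lo, hi in b)]
print(len(boxes), "partition elements,", len(cubic), "of them cubic")
```

In this example the number of cubic elements equals $\lfloor (n-p_1)/k \rfloor \cdot \lfloor (n-p_2)/k \rfloor$; the remaining elements are the truncated blocks at the boundary of the box.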

In the case $M = \Lambda_n$, given a sample $x^n \in \Sigma^n$, such a
partition yields a parsing of $x^n$ in elements of
$A^{(\Lambda_k + r + p) \cap \Lambda_n}$, $r \in k \cdot \mathbb{Z}^d$. We
call those elements the words of the parsing of $x^n$ induced by the
partition $R_{\Lambda_n,k}(p)$. With the exception of those $r$ for which
$\Lambda_k + r + p$ crosses the boundary of $\Lambda_n$, these are cubic
$k$-words. Forgetting about their $r$-position, we may identify

$$\Pi_{\Lambda_k} x \sim \Pi_{\Lambda_k + r}\, \sigma_{-r} x \in A^{\Lambda_k + r} = A^{\Lambda_k}.$$

For $k, n \in \mathbb{N}$, $k < n$, any element $x \in \Sigma$ gives rise to
a probability distribution, defined by the relative frequency of the
different $k$-words in a given parsing of $x^n$. Let us introduce the
following expression:

$$Z_x^{p,k,n}(a) := \sum_{r \in \times_{i=1}^{d} \{0, \dots, \lfloor (n - p_i)/k \rfloor - 1\}} 1_{[a]}(\sigma_{k \cdot r + p}\, x),$$

$$n \in \mathbb{N},\ k < n,\ a \in A^{\Lambda_k},\ p = (p_1, \dots, p_d) \in \Lambda_k.$$

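For regular $k$-block parsings, the non-overlapping empirical k-block
distribution generated by $x \in \Sigma$ in the box $\Lambda_n$ is defined as
the probability distribution on $\Sigma^k$ given by:

$$\tilde\mu_x^{k,n}(\{a\}) := \frac{1}{\lfloor n/k \rfloor^d}\, Z_x^{0,k,n}(a) \quad \text{for } a \in A^{\Lambda_k}. \qquad (1)$$

Similarly, for any $p = (p_1, \dots, p_d) \in \Lambda_k$ the shifted regular
$k$-block partition gives a non-overlapping empirical $k$-block distribution:

$$\tilde\mu_x^{p,k,n}(\{a\}) := \frac{1}{\prod_{i=1}^{d} \lfloor (n - p_i)/k \rfloor}\, Z_x^{p,k,n}(a). \qquad (2)$$

Furthermore, we define the overlapping empirical k-block distribution, in
which all $k$-words present in $x^n$ are considered:

$$\tilde\mu_{x,overl}^{k,n}(\{a\}) := \frac{1}{(n - k + 1)^d} \sum_{r \in \Lambda_{n-k+1}} 1_{[a]}(\sigma_r x) \quad \text{for } a \in A^{\Lambda_k}. \qquad (3)$$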






Remember here the definition of $[a]$. Observe that all three empirical
distributions only depend on the values of $x$ in the positions $\Lambda_n$,
i.e. on $x^n := \Pi_n x \in \Sigma^n$.
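The empirical distributions (1) to (3) can be computed directly from a finite array. The following Python sketch does this for a cubic NumPy array of side $n$; the function names, the use of the raw bytes of a block as dictionary key, and the checkerboard test pattern are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

import numpy as np


def nonoverlapping_distribution(x, k, p=None):
    """Empirical k-block distribution of the p-shifted regular parsing, cf. (1), (2)."""
    d, n = x.ndim, x.shape[0]
    p = (0,) * d if p is None else tuple(p)
    per_axis = [(n - p_i) // k for p_i in p]  # floor((n - p_i) / k) cubic blocks per axis
    counts = Counter()
    for r in product(*(range(c) for c in per_axis)):
        corner = [k * r_i + p_i for r_i, p_i in zip(r, p)]
        block = x[tuple(slice(c, c + k) for c in corner)]
        counts[block.tobytes()] += 1  # bytes of the block serve as a hashable key
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def overlapping_distribution(x, k):
    """Empirical distribution over all (n - k + 1)^d windows of side k, cf. (3)."""
    d, n = x.ndim, x.shape[0]
    counts = Counter()
    for r in product(range(n - k + 1), repeat=d):
        block = x[tuple(slice(c, c + k) for c in r)]
        counts[block.tobytes()] += 1
    total = (n - k + 1) ** d
    return {a: c / total for a, c in counts.items()}


# Checkerboard example with d = 2, n = 4, k = 2.
x = np.indices((4, 4)).sum(axis=0) % 2
non_overl = nonoverlapping_distribution(x, 2)
overl = overlapping_distribution(x, 2)
print(len(non_overl), sorted(overl.values()))
```

On the checkerboard the unshifted parsing sees a single 2-block, whereas the overlapping distribution sees both phases of the pattern; the two distributions can therefore differ markedly on the same sample.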

3 Results 

The main contribution of this paper is the following: 

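Theorem 1 (Universally typical sets) For any given $h_0 > 0$ there exists a
sequence of subsets $\{\mathscr{T}_n(h_0) \subset \Sigma^n\}_n$ such that for
all $\mu \in \mathcal{P}_{erg}$ with $h(\mu) < h_0$ the following holds:

a) $\lim_{n \to \infty} \mu^n(\mathscr{T}_n(h_0)) = 1$,

b) $\lim_{n \to \infty} \frac{\log |\mathscr{T}_n(h_0)|}{n^d} = h_0$.

Furthermore, for any sequence $\{\mathscr{U}_n \subset \Sigma^n\}_n$ with
$\liminf_{n \to \infty} \frac{1}{n^d} \log |\mathscr{U}_n| < h_0$ there exists
a $\mu \in \mathcal{P}_{erg}$ with $h(\mu) < h_0$ such that:

c) $\liminf_{n \to \infty} \mu^n(\mathscr{U}_n) = 0$.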

The proof of this result is based on the assertions that follow. We start by
lifting the packing lemma from [13], which will allow us to use the proof
strategy of the 1-dimensional statement. The packing lemma states that if a
set of words $C \subset \Sigma^m$ is typical among all $m$-blocks present in
a sample $x^k \in \Sigma^k$, $k \ge m$, i.e., $C$ has large probability with
respect to the overlapping empirical $m$-block distribution, then the sample
$x^k$ can be parsed into non-overlapping blocks in such a way that nearly all
words belong to $C$.

While in the $d = 1$ setting the statement is rather evident, for $d \ge 2$
it is not immediately clear how a parsing can be chosen such that it yields
many matchings with $C$ and few 'holes'. Our lemma asserts that this parsing
can be realized through a regular partition, i.e. $C$ receives large
probability in the non-overlapping empirical distribution of some shift of
$x$.

Lemma 2 (Packing Lemma) Consider for any fixed $0 < \delta < 1$ integers $k$
and $m$ related through $k \ge d \cdot m/\delta$. Let $C \subset \Sigma^m$
and $x \in \Sigma$ with the property that
$\tilde\mu_{x,overl}^{m,k}(C) \ge 1 - \delta$. Then there exists a
$p \in \Lambda_m$ such that: a) $\tilde\mu_x^{p,m,k}(C) \ge 1 - 2\delta$, and
also b) $Z_x^{p,m,k}(C) \ge (1 - 4\delta)(\lfloor k/m \rfloor + 2)^d$.

Recall the definition of the overlapping empirical $m$-block distribution.
The condition $\tilde\mu_{x,overl}^{m,k}(C) \ge 1 - \delta$ means
$\sum_{r \in \Lambda_{k-m+1}} 1_{[C]}(\sigma_r x) \ge (1 - \delta)(k - m + 1)^d$.
The result a), $\exists p \in \Lambda_m : \tilde\mu_x^{p,m,k}(C) \ge 1 - 2\delta$,
means that there exists a regular $m$-block partition
$R_{\Lambda_k,m}(p) \in \mathcal{R}_{\Lambda_k,m}$ that parses $x^k$ in such a
way that at least a $(1 - 2\delta)$-fraction of the $m$-words are elements of
$C$. The result b) implies that at least a $(1 - 4\delta)$-fraction of the
total number of words (this total number including non-cubic words at the
boundary) are elements of $C$. This is the case because the boundary
non-cubic elements cover only a small volume. For $\delta = 0$ and any
$k \ge m$ the result a) is trivial, since in that case all $m$-words in $x^k$
are in $C$.
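The statement of the lemma can be explored numerically by brute force: try every shift $p \in \Lambda_m$ and record which regular parsing gives $C$ the largest non-overlapping empirical probability. The Python sketch below does exactly that and is not part of the proof; the function name `best_shift`, the predicate `in_C` standing in for membership in $C$, and the test array are assumptions for illustration.

```python
from itertools import product
from math import prod

import numpy as np


def best_shift(x, m, in_C):
    """Return (p, fraction) maximising the fraction of cubic m-blocks lying in C.

    x is a cubic array of side k; in_C(block) decides membership in C.
    """
    d, k = x.ndim, x.shape[0]
    best_p, best_frac = None, -1.0
    for p in product(range(m), repeat=d):
        per_axis = [(k - p_i) // m for p_i in p]
        total = prod(per_axis)
        if total == 0:
            continue
        hits = 0
        for r in product(*(range(c) for c in per_axis)):
            corner = [m * r_i + p_i for r_i, p_i in zip(r, p)]
            block = x[tuple(slice(c, c + m) for c in corner)]
            hits += bool(in_C(block))
        if hits / total > best_frac:
            best_p, best_frac = p, hits / total
    return best_p, best_frac


# A 9 x 9 sample whose first row and first column are 'bad' (zeros).
x = np.zeros((9, 9), dtype=int)
x[1:, 1:] = 1
p, frac = best_shift(x, 2, lambda block: bool(block.all()))  # C = {all-ones 2-block}
print(p, frac)
```

Here the unshifted parsing wastes many blocks on the bad row and column, while the shift $(1, 1)$ aligns the parsing with the good region.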

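Proof of Lemma 2. Denote by $S$ the set of vectors
$\{r \in \Lambda_{k-m+1} : \sigma_r x \in [C]\}$. For any $p \in \Lambda_m$
denote by $\lambda(p)$ the number of those $r \in S$ satisfying
$r \equiv p \bmod (m)$. Clearly $\lambda(p) = Z_x^{p,m,k}(C)$ is the number of
cubic blocks in the $p$-shifted regular $m$-block partition of $\Lambda_k$
which belong to $C$. Then we have

$$\sum_{r \in \Lambda_{k-m+1}} 1_{[C]}(\sigma_r x) = \sum_{p \in \Lambda_m} \lambda(p) \ge (1 - \delta)(k - m + 1)^d$$

by assumption. Hence there is at least one $p' \in \Lambda_m$ for which
$\lambda(p') \ge \frac{(1 - \delta)(k - m + 1)^d}{m^d}$. It is easy to see
that

$$(1 - \delta)\frac{(k - m + 1)^d}{m^d} \ge (1 - \delta)\frac{(k - m)^d}{m^d} \ge (1 - \delta)\Big(1 - \frac{dm}{k}\Big)\frac{k^d}{m^d} \ge (1 - 2\delta)\frac{k^d}{m^d},$$

where the last step uses $k \ge dm/\delta$. Since the maximal number of
$m$-blocks that can occur in $R_{\Lambda_k,m}(p')$ is $(k/m)^d$, this
completes the proof of a). For b) observe that the total number of partition
elements of the regular partition (including the non-cubic ones at the
boundary) is upper bounded by

$$\Big(\Big\lfloor \frac{k}{m} \Big\rfloor + 2\Big)^d \le \frac{1}{m^d}(k + 2m)^d \le \frac{1}{m^d}\big(k^d + (k + 2m)^{d-1}\, 2dm\big) \le \frac{1}{m^d} \sum_{j=0}^{d} k^{d-j} (2dm)^j \le \frac{k^d}{m^d} \cdot \frac{1}{1 - 2\delta}.$$

Here for the second inequality we used the estimate
$1 - (d - 1)y \le 1/(1 + y)^{d-1}$ with $y = 2m/k$, and for the third one the
estimate $\binom{d-1}{j} \le d^j$. On the other hand, from the first part we
have $\lambda(p') = Z_x^{p',m,k}(C) \ge (1 - 2\delta)\frac{k^d}{m^d}$ and
$1 - 2\delta \ge \frac{1 - 4\delta}{1 - 2\delta} \ge 1 - 4\delta$, which
completes the proof. ■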

Before we continue formulating the results, we give the definitions of entropy- 
typical sets and of typical sampling sets. The latter name is motivated by the 
properties guaranteed by Theorem 5 below. 

Definition 3 (Entropy-typical sets) Let $\delta < \frac{1}{2}$. For some
$\mu$ with entropy rate $h(\mu)$ the entropy-typical sets are defined as:

$$C_m(\delta) := \big\{x \in \Sigma^m : 2^{-m^d(h(\mu) + \delta)} \le \mu^m(\{x\}) \le 2^{-m^d(h(\mu) - \delta)}\big\}. \qquad (4)$$

We will use these sets as basic sets for the typical-sampling-sets defined 
below, see Figure 1. 

Definition 4 (Typical-sampling sets) Consider some $\mu$ and
$\delta < \frac{1}{2}$. For $k \ge m$, we define a typical-sampling set
$\mathcal{T}_k(\delta, m)$ as the set of elements in $\Sigma^k$ that have a
regular $m$-block partition such that the resulting words belonging to the
$\mu$-entropy-typical set $C_m = C_m(\delta)$ contribute at least a
$(1 - \delta)$-fraction to the (slightly modified) number of partition
elements in that regular $m$-block partition:

$$\mathcal{T}_k(\delta, m) := \Big\{x \in \Sigma^k : \sum_{\substack{r \in m \cdot \mathbb{Z}^d \\ (\Lambda_m + r + p) \subset \Lambda_k}} 1_{[C_m]}(\sigma_{r+p}\, x) \ge (1 - \delta)\Big(\frac{k}{m}\Big)^d \text{ for some } p \in \Lambda_m\Big\}.$$

We fix some $\alpha > 0$ and assume $\delta < \frac{\alpha}{\log|A| + 1}$.
Also, in the following we will choose $m$ depending on $k$ such that
$m \to \infty$ as $k \to \infty$, and $\lim_{k \to \infty} \frac{m}{k} = 0$.
As we will see, a sequence of sets $\mathcal{T}_k(\delta, m)$, $k > 0$, with
parameters fulfilling these conditions constitutes a sequence of
typical-sampling sets $\mathcal{T}_k(\alpha)$ (Theorem 5 a)).

Figure 1: Left: This is an example of a regular $m$-block parsing of an
element in $\mathcal{T}_k(\delta, m)$ for $d = 2$. The shaded blocks contain
elements of $C_m$, and fill at least a $(1 - \delta)$-fraction of the total
volume $k^2$. For $k \gg m$ the boundary (non-cubic) blocks comprise a
negligible volume. Right: Here we visualize for $d = 2$ and some
$x^n \in \Sigma^n$ the parsing which is used for the empirical $k$-block
distribution $\tilde\mu_x^{k,n}$. A $k$-block of $x^n$ belongs to
$\mathcal{T}_k(\delta, m)$ if it can be parsed by some (possibly shifted)
regular $m$-block partition in such a way that the resulting
(non-overlapping) $m$-words belonging to $C_m(\delta)$ cover a
$(1 - \delta)$-fraction of all the $k^2$ sites of that $k$-block. The
non-cubic boundary blocks resulting from the $k$-block partition do not
affect the empirical $k$-block distribution $\tilde\mu_x^{k,n}$.

The following theorem is a generalization to $d > 1$ of a result by Ornstein
and Weiss in [11] (Theorem II.3.1 in the monograph of Shields [13]). It
ensures the existence of 'small' libraries from which, asymptotically almost
surely, the realization of an ergodic process can be constructed, i.e.,
parsed as words belonging to that library. The library is given by the
typical-sampling sets of Definition 4. Furthermore, the theorem states that
smaller libraries do not suffice.

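Theorem 5 Given any $\mu \in \mathcal{P}_{erg}$ and any
$\alpha \in (0, \frac{1}{2})$ we have the following:

a) For all $k$ larger than some $k_0 = k_0(\alpha)$ there is a set
$\mathcal{T}_k(\alpha) \subset \Sigma^k$ satisfying

$$\frac{\log |\mathcal{T}_k(\alpha)|}{k^d} \le h(\mu) + \alpha,$$

and such that for $\mu$-a.e. $x$ the following holds:

$$\tilde\mu_x^{k,n}(\mathcal{T}_k(\alpha)) > 1 - \alpha$$

for all $n$ and $k$ such that $\frac{k}{n} < \varepsilon$ for some
$\varepsilon = \varepsilon(\alpha) > 0$ and $n$ larger than some $n_0(x)$.

b) Let $\{\tilde{\mathcal{T}}_{k,n}(x)\}_{k,n>0}$ be a family of
double-sequences of subsets of $\Sigma^k$ depending measurably on
$x \in \Sigma$, such that
$|\tilde{\mathcal{T}}_{k,n}(x)| \le 2^{k^d(h(\mu) - \alpha)}$. Then there
exists a $k_1(\alpha) \ge k_0(\alpha)$ and for $\mu$-a.e. $x$ there exists an
$n_0(x)$ such that

$$\tilde\mu_x^{k,n}(\tilde{\mathcal{T}}_{k,n}(x)) \le \alpha$$

for any indices $k, n$ fulfilling $k > k_1(\alpha)$, $n > n_0(x)$ and
$2^{k^d(h(\mu) + \alpha)} \le n^d$.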
The above result is closely related to the so-called typical sequence theorem
(cf. Theorem 1.4.1 in [13]), a consequence of the individual ergodic theorem,
which says that for an ergodic $\mu$ the following limit exists and satisfies
the equation for almost every $x$:
$\lim_{n \to \infty} \tilde\mu_{x,overl}^{k,n}(a^k) = \mu([a^k])$ for any $k$
and any $a^k \in A^{\Lambda_k}$.

The following theorem states that the entropy of the empirical distribution
of a sample almost surely converges to the true entropy of the process. This
is an important component of the proof of the existence of universally
typical libraries of small cardinality, Theorem 1 a), b).

Theorem 6 (Empirical entropy theorem) Let $\mu \in \mathcal{P}_{erg}$. Then
for any sequence $\{k_n\}$ with $k_n \to \infty$ as $n \to \infty$ and
$k_n^d (h(\mu) + \alpha) \le \log n^d$ (for some $\alpha > 0$) we have

$$\lim_{n \to \infty} \frac{1}{k_n^d} H(\tilde\mu_x^{k_n,n}) = h(\mu), \quad \mu\text{-a.s.}$$
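The quantity appearing in the empirical entropy theorem is easy to compute for a finite sample. The Python sketch below evaluates the per-site entropy of the non-overlapping empirical $k$-block distribution for two 2-dimensional test arrays. The function name, the i.i.d. fair-bit test source, the fixed random seed and the sizes $n = 128$, $k = 2$ are assumptions chosen for illustration; the theorem itself concerns the limit $n \to \infty$ with $k_n$ growing slowly.

```python
from collections import Counter
from itertools import product
from math import log2

import numpy as np


def empirical_entropy_per_site(x, k):
    """(1 / k^d) * H of the unshifted non-overlapping empirical k-block distribution."""
    d, n = x.ndim, x.shape[0]
    counts = Counter()
    for r in product(range(n // k), repeat=d):
        block = x[tuple(slice(k * r_i, k * r_i + k) for r_i in r)]
        counts[block.tobytes()] += 1
    total = sum(counts.values())
    entropy = -sum(c / total * log2(c / total) for c in counts.values())
    return entropy / k ** d


rng = np.random.default_rng(0)                # fixed seed, for reproducibility only
fair = rng.integers(0, 2, size=(128, 128))    # i.i.d. fair bits: entropy rate 1 bit per site
const = np.zeros((128, 128), dtype=int)       # deterministic source: entropy rate 0

est_fair = empirical_entropy_per_site(fair, 2)
est_const = empirical_entropy_per_site(const, 2)
print(f"fair bits: {est_fair:.4f}, constant array: {est_const:.4f}")
```

With $k = 2$ and $n = 128$ the growth condition of the theorem is comfortably met for the fair-bit source, since $k^d(h(\mu) + \alpha) = 4(1 + \alpha)$ is well below $\log n^d = 14$ for small $\alpha$.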

This concludes the section of results. Below we provide the proofs. 
Proofs 

Proof of Theorem 5a). We show that the claim holds choosing
$\mathcal{T}_k(\alpha)$ as typical-sampling sets $\mathcal{T}_k(\delta, m)$
from Definition 4 with $\delta < \frac{\alpha}{\log|A| + 1}$, $m \to \infty$
as $k \to \infty$, and $\lim_{k \to \infty} \frac{m}{k} = 0$.

Cardinality. We estimate the cardinality of the sets
$\mathcal{T}_k(\delta, m)$. For a given $m$, there are $m^d$ possible values
of $p$. There are at most $(k/m)^d$ cubic boxes in any $m$-block partition of
$\Lambda_k$. Therefore, the number of choices for the contents of all blocks
which belong to $C_m$ is at most $|C_m|^{(k/m)^d}$. By the definition of
$\mathcal{T}_k(\delta, m)$, the number of lattice sites not belonging to the
blocks of the regular partition referred to in this definition is at most
$\delta k^d$. There are at most $|A|^{\delta k^d}$ possible choices for the
contents of those array sites. Set $K = \lfloor k/m \rfloor + 2$. The maximal
number of blocks occurring in the partition (including non-cubic ones) is
$K^d$. For $\frac{m}{k}$ small enough, not more than a
$2\delta \le \alpha < \frac{1}{2}$ fraction of all these blocks have contents
not in $C_m$. Taking into account that the binomial coefficients
$\binom{K^d}{l}$ do not decrease in $l$ while $l \le \frac{1}{2}K^d$, we get
the following bound:

$$|\mathcal{T}_k(\delta, m)| \le m^d \sum_{0 \le l \le 2\delta K^d} \binom{K^d}{l}\, |A|^{\delta k^d}\, |C_m|^{(k/m)^d}.$$

We apply Stirling's formula $N! \simeq \sqrt{2\pi N}\,(N/e)^N$, taking into
account that the multiplicative error for positive $N$ is uniformly bounded
from below and above. A coarse bound will suffice. In the following estimate
we make use of the relation $|C_m| \le 2^{m^d(h(\mu) + \delta)}$, following
immediately from the definition of $C_m$. For some positive constants $c$,
$c'$ and $c''$ we have

$$\log|\mathcal{T}_k(\delta, m)| \le \log\Big( c\, m^d K^d \binom{K^d}{\lfloor K^d/2 \rfloor} |A|^{\delta k^d} |C_m|^{(k/m)^d} \Big)$$
$$\le \log\Big( c'\, m^d\, 3^{K^d} K^{d/2}\, |A|^{\delta k^d} |C_m|^{(k/m)^d} \Big)$$
$$\le \log\Big( c''\, k^d\, 3^{(k/m + 2)^d}\, 2^{(h(\mu) + \delta + \delta \log|A|) k^d} \Big)$$
$$\le k^d \Big( h(\mu) + \delta(\log|A| + 1) + \Big(\frac{2}{m}\Big)^d \log 3 + \frac{\log c'' k^d}{k^d} \Big).$$

In the last line we used $1/m + 2/k \le 2/m$, which is fulfilled if $k/m$ is
large enough.

Whenever $\delta < \frac{\alpha}{\log|A| + 1}$ and $m$ as well as $k$ are
large enough (depending on $\alpha$) this yields
$\log|\mathcal{T}_k(\alpha)| \le k^d(h(\mu) + \alpha)$.

Probability bound. The Ornstein-Weiss extension for amenable groups [10] of
the Shannon-McMillan-Breiman theorem yields (see footnote 1):

$$\lim_{m \to \infty} -\frac{1}{m^d} \log \mu^m(\Pi_m x) = h(\mu) \quad \mu\text{-a.s.}$$

Thus, in view of the definition of $C_m$ (Definition 3), there exists an
$m_0(\delta)$ such that $\mu^m(C_m) \ge 1 - \delta^2/5$ for all
$m \ge m_0(\delta)$. We fix such an $m$. The individual ergodic theorem [8]
asserts that the following limit exists for $\mu$-a.e. $x \in \Sigma$:
$\lim_{n \to \infty} \frac{1}{n^d} \sum_{r \in \Lambda_n} 1_{[C_m]}(\sigma_r x) = \int 1_{[C_m]}(x)\, d\mu(x) = \mu^m(C_m)$,
and therefore

$$\sum_{r \in \Lambda_{n-m+1}} 1_{[C_m]}(\sigma_r x) \ge (1 - \delta^2/4)(n - m + 1)^d \ge (1 - \delta^2/3)\, n^d \qquad (5)$$

holds eventually almost surely, i.e. for $\mu$-almost every $x$, and choosing
$n$ large enough depending on $x$, $n \ge n_0(x)$.

Take an $x \in \Sigma$ and an $n \in \mathbb{Z}_+$ for which this is the
case. Choose a $k$ with $m < k < n$. Consider the unshifted regular $k$-block
partition of the $n$-block $\Lambda_n$:

$$\Lambda_n = \bigcup_{r \in k \cdot \mathbb{Z}^d} (\Lambda_k + r) \cap \Lambda_n.$$

Now, from equation (5) we deduce that, if $k/m$ and $n/k$ are large enough,
at least a $(1 - 2\delta)$-fraction of the elements of this regular $k$-block
parsing of $\Pi_n x$ which do not cross the boundary of $\Lambda_n$ (those
which count for the empirical distribution $\tilde\mu_x^{k,n}$, i.e.
$\Pi_k \sigma_r x$ with $r \in k \cdot \mathbb{Z}^d \cap \Lambda_{n-k+1}$)
satisfy

$$\frac{1}{(k - m + 1)^d} \sum_{s \in \Lambda_{k-m+1}} 1_{[C_m]}(\sigma_{s+r}\, x) \ge 1 - \delta/4. \qquad (6)$$

This is because if more than the specified $2\delta$-fraction of the
$k$-blocks had more than a $\delta/4$-fraction of 'bad' $m$-blocks, then the
total number of 'bad' $m$-blocks in $\Pi_n x$ would be larger than

$$2\delta \Big\lfloor \frac{n}{k} \Big\rfloor^d \cdot \frac{\delta}{4}(k - m + 1)^d \ge \frac{\delta^2}{3}\, n^d$$

for $\frac{m}{k}$ and $\frac{k}{n}$ small enough, in contradiction to
equation (5). While $n$ had to be chosen large enough depending on $x$, we
see that $k$ needs to be chosen such that $\frac{k}{n}$ and $\frac{m}{k}$ are
both small enough.

By Lemma 2, if $k \ge 4dm/\delta$, the $k$-blocks which satisfy equation (6)
have a regular $m$-block partition with at least a $(1 - \delta)$-fraction of
all partition members in $C_m$. Hence, at least a $(1 - 2\delta)$-fraction of
all $k$-blocks in $\Lambda_n$ counting for the empirical distribution belong
to $\mathcal{T}_k(\delta, m)$. For $2\delta < \alpha$ we get the probability
bound:

$$\tilde\mu_x^{k,n}(\mathcal{T}_k(\delta, m)) > 1 - \alpha. \qquad (7)$$

This completes the proof of Theorem 5 a). ■

Footnote 1: In fact we only need the convergence in probability, which
ensures $\mu^m(C_m) \to 1$.

Proof of Theorem 5b). The statement is trivial for $h(\mu) = 0$. Let
$h(\mu) > 0$.

For fixed $\delta < \alpha$, consider the sets $E_n(\delta)$ of all $x$ in
$\Sigma$ with the property

$$\tilde\mu_x^{k,n}(\mathcal{T}_k(\delta)) > 1 - \delta \quad \text{for all } k \ge k_0(\delta),\ 2^{k^d(h(\mu) + \alpha)} \le n^d,$$

where $k_0 = k_0(\delta)$ is chosen large enough corresponding to the first
part of the theorem.

Consider the sets $D_n(\alpha, \delta)$ of all $x$ in $\Sigma$ with the
property

$$\tilde\mu_x^{k,n}(\tilde{\mathcal{T}}_{k,n}(x)) > \alpha \quad \text{for some } k \text{ with } k \ge k_0(\delta),\ 2^{k^d(h(\mu) + \alpha)} \le n^d.$$

Remember the definition of entropy-typical sets:

$$C_n(\delta) = \big\{a \in \Sigma^n : 2^{-n^d(h(\mu) + \delta)} \le \mu^n(\{a\}) \le 2^{-n^d(h(\mu) - \delta)}\big\}.$$

Finally, set
$F_n(\delta, \alpha) = [C_n(\delta)] \cap D_n(\alpha, \delta) \cap E_n(\delta)$.

The restriction of any $x$ in $D_n(\alpha, \delta) \cap E_n(\delta)$ to
$\Lambda_n$, i.e. $a := \Pi_n x$, can be described as follows.

1. First we specify a $k$ with $k \ge k_0(\delta)$,
$2^{k^d(h(\mu) + \alpha)} \le n^d$ as in the definition of
$D_n(\alpha, \delta)$.

2. Next, for each of the $\lfloor n/k \rfloor^d$ blocks counting for the
empirical distribution, we specify whether this block belongs to
$\tilde{\mathcal{T}}_{k,n}(x)$, to
$\mathcal{T}_k(\delta) \setminus \tilde{\mathcal{T}}_{k,n}(x)$ or to
$\Sigma^k \setminus (\mathcal{T}_k(\delta) \cup \tilde{\mathcal{T}}_{k,n}(x))$.

3. Then we specify for each such block its contents, pointing either to a
list containing all elements of $\tilde{\mathcal{T}}_{k,n}(x)$, or to a list
containing $\mathcal{T}_k(\delta) \setminus \tilde{\mathcal{T}}_{k,n}(x)$ or,
in the last case, listing all elements of that block.

4. Finally, we list all elements (at the boundary) not covered by the
empirical distribution.

In order to specify $k$ we need at most $\log n$ bits (in fact, much less,
due to the bound on $k$). We need at most $2\lfloor n/k \rfloor^d$ bits to
say which of the cases under 2. is valid for each of the blocks. For 3. we
need the two lists for the given $k$. This needs at most
$\big(2^{k^d(h(\mu) + \delta)} + 2^{k^d(h(\mu) - \alpha)}\big)\, k^d (\log|A| + 1)$
bits. According to the definitions of $D_n(\alpha, \delta)$ and
$E_n(\delta)$, to specify the contents of all $k$-blocks, we need at most

$$\Big(\Big\lfloor \frac{n}{k} \Big\rfloor^d + 1\Big) k^d \big(\alpha(h(\mu) - \alpha) + (1 - \alpha)(h(\mu) + \delta) + \delta(\log|A| + 1)\big)$$

bits. For 4. we need at most
$(n^d - \lfloor n/k \rfloor^d k^d)(\log|A| + 1)$ bits. Hence the cardinality
of $\Pi_n F_n(\delta, \alpha)$ can be estimated by

$$\log|\Pi_n F_n(\delta, \alpha)| \le \log n + 2\frac{n^d}{k_1^d(\alpha)} + n^d \Big( n^{-d\left(1 - \frac{h(\mu) + \delta}{h(\mu) + \alpha}\right)} + n^{-d\left(1 - \frac{h(\mu) - \alpha}{h(\mu) + \alpha}\right)} \Big) \frac{\log n^d}{h(\mu) + \alpha} (\log|A| + 1)$$
$$+ \Big(\Big\lfloor \frac{n}{k} \Big\rfloor^d + 1\Big) k^d \big(h(\mu) - \alpha^2 + (1 - \alpha)\delta + \delta(\log|A| + 1)\big) + \Big(n^d - \Big\lfloor \frac{n}{k} \Big\rfloor^d k^d\Big)(\log|A| + 1)$$
$$\le n^d \big(h(\mu) - \alpha^2/2 + \delta(\log|A| + 2)\big)$$

bits, provided $n$ is large enough and $k_1(\alpha)$ is chosen sufficiently
large. Now, due to $\Pi_n F_n(\delta, \alpha) \subset C_n(\delta)$, we get

$$\mu(F_n(\delta, \alpha)) = \mu^n(\Pi_n F_n(\delta, \alpha)) \le 2^{-n^d(\alpha^2/2 - \delta(\log|A| + 3))}.$$

Making $\delta$ small enough from the beginning, the exponent here is
negative. Hence, by the Borel-Cantelli lemma, only finitely many of the
events $x \in F_n(\delta, \alpha)$ may occur, almost surely. But we know from
the first part of the theorem that $x \in E_n(\delta)$ eventually a.s.
(observe that the condition $2^{k^d(h(\mu) + \alpha)} \le n^d$ implies
$\frac{k}{n} < \varepsilon(\delta)$ as supposed there, for $n$ large enough).
And we know from the Ornstein-Weiss theorem that
$\Pi_n x \in C_n(\delta)$ eventually a.s. Hence
$x \in (\Sigma \setminus F_n(\delta, \alpha)) \cap E_n(\delta) \cap [C_n(\delta)] \subset \Sigma \setminus D_n(\alpha, \delta)$
eventually a.s.

This is the assertion b) of the theorem. ■

Proof of Theorem 6. The proof follows the ideas of the proof of the
one-dimensional statement Theorem II.3.5 in [13].

Let $\alpha < \frac{1}{4}$ and consider the sets $\mathcal{T}_k(\alpha)$
given in Theorem 5. Define
$U_{k,n}(x) := \{a \in \mathcal{T}_k(\alpha) : \tilde\mu_x^{k,n}(a) < 2^{-k^d(h(\mu) + 2\alpha)}\}$.
We have $|\mathcal{T}_k(\alpha)| \le 2^{k^d(h(\mu) + \alpha)}$. From this we
deduce $\tilde\mu_x^{k,n}(U_{k,n}(x)) \le 2^{-k^d \alpha}$ for any $x$.

Consider also the sets
$V_{k,n}(x) := \{a \in \mathcal{T}_k(\alpha) : \tilde\mu_x^{k,n}(a) > 2^{-k^d(h(\mu) - 2\alpha)}\}$.
Then obviously $|V_{k,n}(x)| \le 2^{k^d(h(\mu) - 2\alpha)}$. Now the second
part of Theorem 5 states that for $\mu$-a.e. $x$ there exists an $n_0(x)$
such that $\tilde\mu_x^{k,n}(V_{k,n}(x)) \le 2\alpha$, provided
$n \ge n_0(x)$, $k \ge k_1(2\alpha)$ and
$2^{k^d(h(\mu) + 2\alpha)} \le n^d$.

We define
$M_{k,n}(x) := \mathcal{T}_k(\alpha) \setminus (U_{k,n}(x) \cup V_{k,n}(x))$,
and conclude that for $\mu$-a.e. $x$ the following holds:

$$\tilde\mu_x^{k,n}(M_{k,n}(x)) \ge 1 - 4\alpha,$$

provided $n \ge n_0(x)$, $k \ge k_2(2\alpha)$ and
$2^{k^d(h(\mu) + 2\alpha)} \le n^d$, where $k_2(\alpha) \ge k_1(\alpha)$ is
chosen such that $2^{-k_2(\alpha)^d \alpha} \le \alpha$.

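Consider now the definition of the Shannon entropy of the empirical
distribution:

$$H(\tilde\mu_x^{k,n}) = -\sum_{a \in \Sigma^k} \tilde\mu_x^{k,n}(a) \log \tilde\mu_x^{k,n}(a) = -\sum_{a \in \Sigma^k \setminus M_{k,n}} \cdots \; - \sum_{a \in M_{k,n}} \cdots \; =: S_{k,n} + X_{k,n}.$$

We write $B_{k,n}(x) := \Sigma^k \setminus M_{k,n}(x)$. For the first sum an
upper bound is given by (see footnote 2)

$$S_{k,n} \le \tilde\mu_x^{k,n}(B_{k,n}(x))\, k^d \log|A| - \tilde\mu_x^{k,n}(B_{k,n}(x)) \log \tilde\mu_x^{k,n}(B_{k,n}(x)).$$

Hence $\limsup_{n \to \infty} \frac{1}{k_n^d} S_{k_n,n} \le 4\alpha \log|A|$
holds $\mu$-almost surely under the assumptions of the theorem.

As for the second sum, bear in mind that the elements $a$ in $M_{k,n}(x)$
have the property

$$k^d(h(\mu) - 2\alpha) \le -\log \tilde\mu_x^{k,n}(a) \le k^d(h(\mu) + 2\alpha)$$

and thus

$$\frac{1}{k^d} X_{k,n} \ge \sum_{a \in M_{k,n}(x)} \tilde\mu_x^{k,n}(a)(h(\mu) - 2\alpha) \ge (1 - 4\alpha)(h(\mu) - 2\alpha),$$

$$\frac{1}{k^d} X_{k,n} \le \sum_{a \in M_{k,n}(x)} \tilde\mu_x^{k,n}(a)(h(\mu) + 2\alpha) \le h(\mu) + 2\alpha.$$

Therefore we have

$$(1 - 4\alpha)(h(\mu) - 2\alpha) \le \liminf_{n \to \infty} \frac{1}{k_n^d} H(\tilde\mu_x^{k_n,n}) \le \limsup_{n \to \infty} \frac{1}{k_n^d} H(\tilde\mu_x^{k_n,n}) \le h(\mu) + \alpha(2 + 4\log|A|)$$

holding $\mu$-a.s.

Finally, observe that a sequence $k_n$ fulfilling the two assumptions of the
theorem for some $\alpha > 0$ in fact fulfils them for any smaller $\alpha$
too. This proves the result. ■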

Proof of Theorem 1. 1. Each $x \in \Sigma$ gives rise to a family of
empirical distributions $\{\tilde\mu_x^{k,n}\}_{k \le n}$.



Footnote 2: Observe that $-\sum_{a \in B} p(a) \log p(a) \le p(B) \log|B| - p(B) \log p(B)$.






We define for each $n$ the set $\mathscr{T}_n(h_0)$ as the set of elements in
$\Sigma^n$ having empirical $k$-block entropy per symbol not greater than
$h_0$:

$$\mathscr{T}_n(h_0) := \Pi_n \big\{x \in \Sigma : H(\tilde\mu_x^{k,n}) \le k^d h_0\big\}.$$

Here we have to choose $k$ depending on $n$ (how exactly will be specified
later).

The number of all (non-overlapping) empirical $k$-block distributions in
$\Sigma^n$ is upper bounded by
$\big(\big(\frac{n}{k}\big)^d\big)^{|A|^{k^d}}$, since
$\lfloor n/k \rfloor^d$ is the maximum number of occurrences of any
particular $k$-block in the parsing of an element of $\Sigma^n$, and
$|A|^{k^d}$ is the number of elements in $\Sigma^k$.

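For the number of elements $x^n \in \Sigma^n$ which give rise to the same
empirical distribution $\tilde\mu_x^{k,n}$ we find an upper bound which
depends only on the entropy of that empirical distribution:

For a given $n$ such that $\lfloor n/k \rfloor = n/k$ we consider the product
measure $P = (\tilde\mu_x^{k,n})^{\otimes (n/k)^d}$ on $\Sigma^n$:

$$P(y^n) = \prod_{\substack{r \in k \cdot \mathbb{Z}^d \\ \Lambda_k + r \subset \Lambda_n}} \tilde\mu_x^{k,n}(\Pi_k(\sigma_r y)),$$

which yields

$$P(y^n) = \prod_{a \in \Sigma^k} \big(\tilde\mu_x^{k,n}(a)\big)^{(n/k)^d\, \tilde\mu_x^{k,n}(a)} = 2^{-(n/k)^d H(\tilde\mu_x^{k,n})}, \quad \forall y : \tilde\mu_y^{k,n} = \tilde\mu_x^{k,n}, \qquad (8)$$

and thus
$|\{y \in \Sigma^n : \tilde\mu_y^{k,n} = \tilde\mu_x^{k,n}\}| \le 2^{(n/k)^d H(\tilde\mu_x^{k,n})}$.

For a general $n$ with $\lfloor n/k \rfloor \ne n/k$, the entries in the
positions $\Lambda_n \setminus \Lambda_{k \cdot \lfloor n/k \rfloor}$ of a
$y \in \Sigma^n$ may be occupied arbitrarily, giving the following bound:

$$|\{y \in \Sigma^n : \tilde\mu_y^{k,n} = \tilde\mu_x^{k,n}\}| \le 2^{\lfloor n/k \rfloor^d H(\tilde\mu_x^{k,n})} \cdot |A|^{n^d - (n - k)^d}. \qquad (9)$$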
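The counting bound behind (8) can be checked on a small example. If the parsing of an array consists of $N = (n/k)^d$ blocks and the distinct blocks occur with given multiplicities, then the number of arrangements with these multiplicities is a multinomial coefficient, and it cannot exceed $2^{N H}$, where $H$ is the entropy of the empirical distribution. The Python sketch below verifies this for one set of multiplicities; the function names and the example counts are assumptions made for illustration.

```python
from math import factorial, log2


def type_class_size(counts):
    """Number of block arrangements with the given occurrence counts (multinomial)."""
    size = factorial(sum(counts))
    for c in counts:
        size //= factorial(c)  # exact: each intermediate quotient is an integer
    return size


def entropy_bits(counts):
    """Entropy in bits of the empirical distribution with these occurrence counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


counts = [3, 2, 1]                    # three distinct k-blocks among N = 6 blocks
N = sum(counts)
size = type_class_size(counts)        # 6! / (3! 2! 1!)
bound = 2 ** (N * entropy_bits(counts))
print(size, "<=", round(bound, 1))
```

The bound is far from tight for such small $N$, but only its exponential order matters in the proof.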

Now we are able to give an upper estimate for the number
$|\mathscr{T}_n(h_0)|$ of all configurations in $\Lambda_n$ which produce an
empirical distribution with entropy not larger than $k^d h_0$:

$$|\mathscr{T}_n(h_0)| \le \Big(\Big(\frac{n}{k}\Big)^d\Big)^{|A|^{k^d}}\, 2^{h_0 n^d}\, |A|^{n^d - (n - k)^d},$$

$$\log|\mathscr{T}_n(h_0)| \le n^d h_0 + (n^d - (n - k)^d) \log|A| + |A|^{k^d}\, d \log\frac{n}{k}.$$

Introducing the restriction
$k^d \le \frac{1}{1 + \varepsilon} \log_{|A|} n^d = \frac{\log n^d}{(1 + \varepsilon)\log|A|}$,
$\varepsilon > 0$ arbitrary, we conclude that
$|\mathscr{T}_n(h_0)| \le 2^{n^d h_0 + o(n^d)}$ (uniformly in $k$ under the
restriction). This yields
$\limsup_{n \to \infty} \frac{\log|\mathscr{T}_n(h_0)|}{n^d} \le h_0$.

2. Next we have to prove that such a sequence of sets, with $k = k(n)$
suitably specified, is asymptotically typical for all
$\mu \in \mathcal{P}_{erg}$ with $h(\mu) < h_0$. Given any $\mu$ with
$h(\mu) < h_0$, Theorem 6 states that for $\mu$-a.e. $x$ the $k(n)$-block
empirical entropy of $\tilde\mu_x^{k,n}$ converges to $h(\mu)$, provided that
$k(n)$ is a sequence with $k(n) \to \infty$ and
$k(n)^d \le \frac{\log n^d}{h(\mu) + \alpha}$, where $\alpha > 0$ can be
chosen arbitrarily. Since $h(\mu) \le \log|A|$, choosing
$k^d(n) \le \frac{\log n^d}{(1 + \varepsilon)\log|A|}$, $\varepsilon > 0$
arbitrary, we get assertion a) from the definition of $\mathscr{T}_n(h_0)$.
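3. For a sequence $\{\mathscr{U}_n \subset \Sigma^n\}_n$ with
$\liminf_{n \to \infty} \frac{1}{n^d} \log|\mathscr{U}_n| = h_1 < h_0$ we
find a $\mu$ with $h(\mu) = h_2$, $h_1 < h_2 < h_0$. We know that $\mu^n$ is
asymptotically confined to the entropy-typical subsets:

$$C_n(\delta) := \big\{a \in \Sigma^n : 2^{-n^d(h_2 + \delta)} \le \mu^n(\{a\}) \le 2^{-n^d(h_2 - \delta)}\big\}.$$

Hence, we get the following:

$$\liminf_{n \to \infty} \mu^n(\mathscr{U}_n) = \liminf_{n \to \infty} \mu^n(\mathscr{U}_n \cap C_n(\delta)) \le \liminf_{n \to \infty} |\mathscr{U}_n|\, 2^{-n^d(h_2 - \delta)} = \lim_{n \to \infty} 2^{n^d(h_1 - h_2 + \delta)}.$$

Choosing $\delta$ small enough, this is zero. This proves c). Also, combining
c) with a), we get
$\liminf_{n \to \infty} \frac{1}{n^d} \log|\mathscr{T}_n(h_0)| \ge h_0$. In
1. we proved
$\limsup_{n \to \infty} \frac{1}{n^d} \log|\mathscr{T}_n(h_0)| \le h_0$, thus
b) is verified. ■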



4 Conclusions 

We have formulated and shown multidimensional extensions of important the- 
oretical results about samplings of ergodic sources. Since these results give a 
mathematical basis for the design of universal source coding schemes, we here- 
with provide a truly multidimensional mathematical framework for the optimal 
compression of multidimensional data. 

We have shown that the set of $n \times \cdots \times n$ arrays which have
empirical $k$-block distributions of per-site entropy not larger than $h_0$
is asymptotically typical for all ergodic $A$-processes of entropy rate
smaller than $h_0$, where
$k = \big\lfloor \sqrt[d]{c \log_{|A|} n^d} \big\rfloor$, $0 < c < 1$. In
other words, for all $A$-processes of entropy rate smaller than $h_0$ the
probability of the corresponding cylinder set tends to 1 as $n \to \infty$.
These sets have a log cardinality of order $n^d h_0$.
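To get a feeling for how slowly the window side $k$ grows with the sample side $n$ under the rule stated in the conclusions, the following Python sketch evaluates $k$ for a few sample sizes. The function name and the chosen values of $n$, $d$, $|A|$ and $c$ are assumptions made for illustration only.

```python
from math import floor, log2


def block_side(n, d, alphabet_size, c):
    """k = floor((c * log_{|A|}(n^d))^(1/d)) with 0 < c < 1."""
    log_nd = d * log2(n) / log2(alphabet_size)  # log base |A| of n^d
    return floor((c * log_nd) ** (1.0 / d))


# Binary images (d = 2, |A| = 2) with c = 0.5.
for n in (64, 1024, 10 ** 6):
    print(n, block_side(n, 2, 2, 0.5))
```

Even for an image with $10^6 \times 10^6$ binary sites the rule gives windows only a few sites wide, which reflects the logarithmic growth condition on $k^d$.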